WHAT IS CLAIMED IS:

1. A document classification system for
classifying a document based on contents of the document
of which contents contains a plurality of items, said
document classification system comprising:

inputting means for inputting document data
corresponding to the document data;

designating means for designating at least one
of the items contained in the document input by said
inputting means;

converting means for converting the document
data into converted data so that the converted data
contains only data corresponding to the item designated
by said designating means; and

classifying means for classifying the document
by using the converted data produced by said converting
means.

2. The document classification system as
claimed in claim 1, wherein said classifying means
includes document vector producing means for producing a
feature vector representing a feature of the converted
data so as to classify the document in accordance with
the feature vector produced by said document vector
producing means.

3. The document classification system as
claimed in claim 1, wherein said converting means
includes separation sign inserting means for inserting a
predetermined sign between sets of data corresponding to
the items so as to facilitate separation of each data
corresponding to each item in the converted data.

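
As an informal illustration of claims 1 to 3 (not part of the claim language), the following is a minimal sketch assuming a document is held as a mapping from item names to text. The item names, the `##` separation sign, and the helper names `convert` and `feature_vector` are hypothetical; the claims do not specify them, and a plain term count stands in for whatever feature vector the claimed system produces.

```python
from collections import Counter

SEPARATOR = "##"  # hypothetical "predetermined sign"; the claims name no specific sign


def convert(document, designated_items, separator=SEPARATOR):
    """Keep only the designated items and join them with the separation sign."""
    parts = [document[name] for name in designated_items if name in document]
    return f" {separator} ".join(parts)


def feature_vector(converted, separator=SEPARATOR):
    """Build a simple term-count vector from the converted data (an assumed feature)."""
    tokens = [t.lower() for t in converted.split() if t != separator]
    return Counter(tokens)


# Hypothetical document with three items; only two are designated.
doc = {
    "title": "Image coding apparatus",
    "abstract": "An apparatus for coding image data",
    "inventor": "Taro Yamada",
}
converted = convert(doc, ["title", "abstract"])
vec = feature_vector(converted)
```

Because the undesignated `inventor` item never reaches the converted data, its words cannot influence the classification, which appears to be the point of the converting means.
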
4. A document classification method for
classifying a document based on contents of the document
of which contents contains a plurality of items, said
document classification method comprising the steps of:

inputting document data corresponding to the
document data;

designating at least one of the items
contained in the document input in the inputting step;

converting the document data into converted
data so that the converted data contains only data
corresponding to the item designated in the designating
step; and

classifying the document by using the
converted data produced in the converting step.

5. The document classification method as
claimed in claim 4, wherein the classifying step
includes the step of producing a feature vector
representing a feature of the converted data so as to
classify the document in accordance with the feature
vector.

6. The document classification system as
claimed in claim 4, wherein the converting step includes
the step of inserting a predetermined sign between sets
of data corresponding to the items so as to facilitate
separation of each data corresponding to each item in
the converted data.

7. A processor readable medium storing
program code causing a computer to classify a document
based on contents of the document of which contents
contains a plurality of items, comprising:

first program code means for inputting
document data corresponding to the document data;

second program code means for designating at
least one of the items contained in the document;

third program code means for converting the
document data into converted data so that the converted
data contains only data corresponding to the item
designated by the second program code means; and

fourth program code means for classifying the
document by using the converted data produced by the
third program code means.

8. The processor readable medium as claimed
in claim 7, wherein the fourth program code means
includes fifth program code means for producing a
feature vector representing a feature of the converted
data so as to classify the document in accordance with
the feature vector.

9. The processor readable medium as claimed
in claim 7, wherein the third program code means
includes sixth program code means for inserting a
predetermined sign between sets of data corresponding to
the items so as to facilitate separation of each data
corresponding to each item in the converted data.

10. A document classification system for
classifying a document according to contents of the
document, said document classification system
comprising:

input means for inputting document data of the
document;

analyzing means for analyzing the document
data so as to obtain analysis information;

vector producing means for producing a
document feature vector with respect to the document
data based on the analysis information;

transforming function calculating means for
calculating a representation transforming function used
for projecting the document feature vector onto a space
in which similarity between the document feature vectors
is reflected;

vector transforming means for transforming the
document feature vector by using the representation
transforming function;

classification means for classifying the
document based on similarity between the document
feature vectors transformed by the vector transforming
means; and

classification result storing means for
storing a result of classification performed by the
classification means.

11. The document classification system as
claimed in claim 10, further comprising inner product
calculating means for calculating an inner product
between the document feature vectors, wherein said
representation transforming function calculating means
calculates the representation transforming function by
using the inner product.

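
As an informal illustration of claims 10 and 11 (not part of the claim language), the following is a minimal sketch of one way a transforming function could be derived from inner products. It assumes an empirical kernel map, in which each document feature vector is re-expressed by its inner products with every document in the collection. The claims do not say which transforming function is used, so this choice and the names `inner`, `make_transform`, and `cosine` are assumptions.

```python
import math


def inner(u, v):
    """Inner product between two document feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def make_transform(doc_vectors):
    """Return f(x) = [<x, d_1>, ..., <x, d_n>], an assumed transforming function."""
    basis = [list(d) for d in doc_vectors]

    def transform(x):
        return [inner(x, d) for d in basis]

    return transform


def cosine(u, v):
    """Cosine similarity, used here to compare transformed vectors."""
    nu, nv = math.sqrt(inner(u, u)), math.sqrt(inner(v, v))
    return inner(u, v) / (nu * nv) if nu and nv else 0.0


# Hypothetical term-count vectors: the first two documents share vocabulary.
docs = [[2, 1, 0, 0], [1, 2, 0, 0], [0, 0, 1, 2]]
f = make_transform(docs)
transformed = [f(d) for d in docs]
```

In the transformed space the first two documents stay close to each other and remain orthogonal to the third, so a similarity-based classifier would group them together.
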
12. The document classification system as
claimed in claim 11, further comprising document
similarity information setting means for setting
document similarity setting information including data
representing an author of the document and a date of
production of the document, wherein said representation
transforming function calculating means calculates the
representation transforming function by using the inner
product and the document similarity information.

13. The document classification system as
claimed in claim 10, further comprising:

vector storing means for storing the document
feature vector produced by said vector producing means;
and

transforming function storing means for
storing the representation transforming function
calculated by said representation transforming function
calculating means.

14. The document classification system as
claimed in claim 10, further comprising vector
correcting means for correcting the document feature
vector before the document feature vector is transformed
by said vector transforming means, a correction being
performed by processing one of the document feature
vector and a feature dimension constituting the document
feature vector in accordance with a rule established by
characteristics of words extracted by said analyzing
means.

15. The document classification system as
claimed in claim 14, further comprising transforming
function correcting means for correcting the
representation transforming function calculated by said
transforming function calculating means when the feature
dimension is changed due to a correction of the document
feature vector by said vector correcting means so that
the document feature vector is transformed by said
vector transforming means in accordance with the changed
feature dimension.
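
As an informal illustration of claims 14 and 15 (not part of the claim language), the following is a minimal sketch assuming the correction rule is "drop the dimensions that belong to stop words" and the transforming function is a matrix with one column per feature dimension. The stop-word list, the matrix values, and the names `correct_vector`, `correct_transform`, and `project` are all hypothetical.

```python
STOP_WORDS = {"the", "of", "a"}  # hypothetical rule based on word characteristics


def correct_vector(vocab, vector, stop_words=STOP_WORDS):
    """Remove feature dimensions whose word matches the rule; report which were kept."""
    keep = [i for i, word in enumerate(vocab) if word not in stop_words]
    return [vocab[i] for i in keep], [vector[i] for i in keep], keep


def correct_transform(matrix, keep):
    """Drop the same columns from the transform so it matches the changed dimensions."""
    return [[row[i] for i in keep] for row in matrix]


def project(matrix, vector):
    """Apply the (linear) transforming function to a feature vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


vocab = ["the", "image", "of", "coding"]
vector = [5, 2, 3, 1]
matrix = [[0.1, 1.0, 0.1, 0.0], [0.1, 0.0, 0.1, 1.0]]

new_vocab, new_vector, keep = correct_vector(vocab, vector)
new_matrix = correct_transform(matrix, keep)
projected = project(new_matrix, new_vector)
```

Correcting the vector without correcting the matrix would leave the two with mismatched dimensions, which is the situation claim 15 addresses.
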



16. The document classification system as
claimed in claim 10, further comprising:

transforming function correction instructing
means for sending an instruction regarding a process to
be applied on a feature dimension of the representation
transforming function; and

transforming function correcting means for
correcting the representation transforming function
based on a content of the instruction sent from said
transforming function correction instructing means.

17. The document classification system as
claimed in claim 16, wherein the process indicated in
the content of the instruction is performed by using
data of an arbitrary document vector.

18. The document classification system as
claimed in claim 16, wherein the process indicated in
the content of the instruction is performed by using the
document feature vectors.

19. The document classification system as
claimed in claim 16, wherein the process indicated in
the content of the instruction is performed by using the
analysis information obtained by said analyzing means.

20. The document classification system as
claimed in claim 16, wherein the process indicated in
the content of the instruction is performed by using the
result of classification stored in said
classification-result storing means.

21. The document classification system as
claimed in claim 10, further comprising:

an initial cluster centroid designating means
for designating an initial cluster centroid; and

initial cluster centroid registering means for
registering the initial cluster centroid designated by
said initial cluster centroid designating means,

wherein said classification means classifies
the document in accordance with the initial cluster
centroid registered by said initial cluster centroid
registering means.

22. The document classification system as
claimed in claim 21, wherein the initial cluster
centroid designated by said initial cluster centroid
designating means is arbitrary document vector data.

23. The document classification system as
claimed in claim 21, wherein the initial cluster
centroid designated by said initial cluster centroid
designating means is the document feature vector.

24. The document classification system as
claimed in claim 21, wherein the initial cluster
centroid designated by said initial cluster centroid
designating means is the analysis information obtained
by said analyzing means.

25. The document classification system as
claimed in claim 21, wherein the initial cluster
centroid designated by said initial cluster centroid
designating means is the result of classification stored
by said classification-result storing means.
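
As an informal illustration of claims 21 to 25 (not part of the claim language), the following is a minimal sketch assuming the classification is a k-means style clustering that starts from designated centroids instead of random ones. The clustering algorithm, the iteration count, and the names `sq_dist` and `classify` are assumptions; the seeds here are two of the document feature vectors, in the manner of claim 23.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def classify(vectors, initial_centroids, iterations=10):
    """Cluster vectors, starting from the registered initial cluster centroids."""
    centroids = [list(c) for c in initial_centroids]
    labels = []
    for _ in range(iterations):
        # Assign every vector to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: sq_dist(v, centroids[k]))
            for v in vectors
        ]
        # Move each centroid to the mean of its members (unchanged if it has none).
        for k in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical transformed document feature vectors.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
initial = [vectors[0], vectors[2]]  # designated and registered seeds
labels, centroids = classify(vectors, initial)
```

Seeding with designated centroids makes the result reproducible and lets a user steer which groupings the classification starts from.
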



26. A document classification method for
classifying a document according to contents of the
document, said document classification method comprising
the steps of:

inputting document data of the document;

analyzing the document data so as to obtain
analysis information;

producing a document feature vector with
respect to the document data based on the analysis
information;

calculating a representation transforming
function used for projecting the document feature vector
onto a space in which similarity between the document
feature vectors is reflected;

transforming the document feature vector by
using the representation transforming function;

classifying the document based on similarity
between the document feature vectors transformed in the
step of transforming; and

storing a result of classification performed
in the step of classifying.

27. The document classification method as
claimed in claim 26, further comprising the step of
calculating an inner product between the document
feature vectors, wherein the representation transforming
function is calculated by using the inner product.

28. The document classification method as
claimed in claim 27, further comprising the step of
setting document similarity setting information
including data representing an author of the document
and a date of production of the document, wherein the
representation transforming function is calculated by
using the inner product and the document similarity
information.

29. The document classification method as
claimed in claim 26, further comprising the steps of:

storing the document feature vector produced
in the step of producing said document feature vector;
and

storing the representation transforming
function calculated in the step of calculating said
representation transforming function.

30. The document classification method as
claimed in claim 26, further comprising the step of
correcting the document feature vector before the
document feature vector is transformed in the step of
transforming, a correction being performed by processing
one of the document feature vector and a feature
dimension constituting the document feature vector in
accordance with a rule established by characteristics of
words extracted in the step of analyzing.

31. The document classification method as
claimed in claim 30, further comprising the step of
correcting the representation transforming function
calculated in the step of calculating when the feature
dimension is changed due to a correction of the document
feature vector in the step of correcting so that the
document feature vector is transformed in the step of
transforming in accordance with the changed feature
dimension.

32. The document classification method as
claimed in claim [illegible], further comprising the steps of:

sending an instruction regarding a process to
be applied on a feature dimension of the representation
transforming function; and

correcting the representation transforming
function based on a content of the instruction sent in
the step of sending.

33. The document classification method as
claimed in claim 32, wherein the process indicated in
the content of the instruction is performed by using
data of an arbitrary document vector.

34. The document classification method as
claimed in claim 32, wherein the process indicated in
the content of the instruction is performed by using the
document feature vectors.

35. The document classification method as
claimed in claim 32, wherein the process indicated in
the content of the instruction is performed by using the
analysis information obtained by said analyzing means.

36. The document classification method as
claimed in claim 32, wherein the process indicated in
the content of the instruction is performed by using the
result of classification stored in said
classification-result storing means.

37. The document classification method as
claimed in claim 26, further comprising the steps of:

designating an initial cluster centroid; and

registering the initial cluster centroid
designated in the step of designating,

wherein the document is classified in
accordance with the initial cluster centroid registered
in the step of registering.

38. The document classification method as 
claimed in claim 37, wherein the initial cluster 
centroid designated in the step of designating is 
arbitrary document vector data.

39. The document classification method as
claimed in claim 37, wherein the initial cluster
centroid designated in the step of designating is the
document feature vector.

40. The document classification method as
claimed in claim 37, wherein the initial cluster
centroid designated in the step of designating is the
analysis information obtained in the step of analyzing.

41. The document classification method as
claimed in claim 37, wherein the initial cluster
centroid designated in the step of designating is the
result of classification stored in the step of storing.

42. A processor readable medium storing
program code causing a computer to classify a document
according to contents of the document, comprising:

first program code means for inputting
document data of the document;

second program code means for analyzing the
document data so as to obtain analysis information;

third program code means for producing a
document feature vector with respect to the document
data based on the analysis information;

fourth program code means for calculating a
representation transforming function used for projecting
the document feature vector onto a space in which
similarity between the document feature vectors is
reflected;

fifth program code means for transforming the
document feature vector by using the representation
transforming function;

sixth program code means for classifying the
document based on similarity between the document
feature vectors transformed by the fifth program code
means; and

seventh program code means for storing a
result of classification performed by the classification
means.

43. The processor readable medium as claimed
in claim 42, further comprising eighth program code
means for calculating an inner product between the
document feature vectors, wherein the representation
transforming function is calculated by using the inner
product.

44. The processor readable medium as claimed
in claim 43, further comprising ninth program code means
for setting document similarity setting information
including data representing an author of the document
and a date of production of the document, wherein the
representation transforming function is calculated by
using the inner product and the document similarity
information.

45. The processor readable medium as claimed
in claim 42, further comprising:

tenth program code means for storing the
document feature vector produced by the third program
code means; and

eleventh program code means for storing the
representation transforming function calculated by the
fourth program code means.

46. The processor readable medium as claimed
in claim 42, further comprising twelfth program code
means for correcting the document feature vector before
the document feature vector is transformed by the fifth
program code means, a correction being performed by
processing one of the document feature vector and a
feature dimension constituting the document feature
vector in accordance with a rule established by
characteristics of words extracted by the second program
code means.

47. The processor readable medium as claimed
in claim 46, further comprising thirteenth program code
means for correcting the representation transforming
function calculated by the fourth program code means
when the feature dimension is changed due to a
correction of the document feature vector by the twelfth
program code means so that the document feature vector
is transformed by the fifth program code means in
accordance with the changed feature dimension.

48. The processor readable medium as claimed
in claim 42, further comprising:

fourteenth program code means for sending an
instruction regarding a process to be applied on a
feature dimension of the representation transforming
function; and

fifteenth program code means for correcting
the representation transforming function based on a
content of the instruction sent by the fourteenth
program code means.

49. The processor readable medium as claimed
in claim 42, further comprising:

sixteenth program code means for designating
an initial cluster centroid; and

seventeenth program code means for registering
the initial cluster centroid designated by the sixteenth
program code means,

wherein the document is classified in
accordance with the initial cluster centroid registered
by the seventeenth program code means.